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Abstract. Probabilistic Boolean networks (PBNs) is a well-established compu¬ 
tational framework for modelling biological systems. The steady-state dynamics 
of PBNs is of crucial importance in the study of such systems. However, for 
large PBNs, which often arise in systems biology, obtaining the steady-state dis¬ 
tribution poses a significant challenge. In fact, statistical methods for steady-state 
approximation are the only viable means when dealing with large networks. In 
this paper, we revive the two-state Markov chain approach presented in the liter¬ 
ature. We first identify a problem of generating biased results, due to the size of 
the initial sample with which the approach needs to start and we propose a few 
heuristics to avoid such a pitfall. Second, we conduct an extensive experimental 
comparison of the two-state Markov chain approach and another approach based 
on the Skart method and we show that statistically the two-state Markov chain 
has a better performance. Finally, we apply this approach to a large PBN model 
of apoptosis in hepatocytes. 


1 Introduction 

Systems biology aims to study biological systems from a holistic perspective, with 
the goal to provide a comprehensive, system-level understanding of cellular behaviour. 
Proper functioning of a living cell requires a finely-tuned and orchestrated interplay 
of many complex processes. Complex interactions within a biological system lead to 
emergent properties which are crucial for sustaining life. Therefore, understanding the 
machinery of life requires the use of holistic approaches which enable the study of a sys¬ 
tem as a whole, in contrast to the reductionist approach. Computational modelling plays 
a prominent role in the field of systems biology. Construction and analysis of a com¬ 
putational model for some biological process enables the systematisation of available 
biological knowledge, identification of missing biological information, provides formal 
means for understanding and reasoning about the concerted interplay between different 
parts of the model, finally reveals directions for future experimental work which could 
provide data for better understanding of the process under study. 

Unfortunately, computational modelling of biological processes that take place in 
a living cell poses significant challenges with respect to the size of the state-space that 
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needs to be considered. Modelling of certain parts of cellular machinery such as gene 
regulatory networks (GRNs) or signal transduction pathways often leads to dynami¬ 
cal models characterised by huge state-spaces of sizes that surpass the sizes of any 
human-designed systems by orders of magnitude. Therefore, profound understanding 
of biological processes asks for the development of new methods and approaches that 
would provide means for formal analysis and reasoning about such huge systems. 

In this study we concentrate on the analysis of the steady-state dynamics of biologi¬ 
cal processes, in particular GRNs, modelled as discrete-time Markov chains (DTMCs). 
This is the case, for example, when the biological system under study is cast into the 
mathematical/computational framework of probabilistic Boolean networks (PBNs) GEl. 

In these or other discrete-time models, e.g., dynamic Bayesian networks, the real (con¬ 
sidered as continuous) time is not modelled. Instead, the evolution of the system is 
abstracted as a sequence of consecutive events. These coarse-grained models have been 
successfully applied in many systems biology studies and proved their predictive power 0. 
In fact, for the study of large regulatory systems they remain the only reasonable so¬ 
lution. Extrapolating the ordinary differential equations model of a single elementary 
building block of the network (e.g., a gene) to the whole large system would result 
in a prohibitively complex model. However, moving towards a higher-level descrip¬ 
tion by ignoring the molecular details allows to grasp the system-level behaviour of the 
network 0. In consequence, these coarse-grained formalisms are broadly applied in 
systems biology studies of systems where the predictions of exact reaction times are 
not of main interest. For example, this is the case in the study of dynamical attractors 
of a regulatory network, which seem to depend on the circuit wiring rather than kinetic 
constants (such as production rates or interaction rates) 0. In this sense modelling 
biological systems with more abstract, high-level view formalisms has certain unques¬ 
tionable advantages. 

One of the key aspects in the analysis of such dynamic systems is the comprehen¬ 
sion of their steady-state (long-run) behaviour. For example, attractors of such systems 
were hypothesised to characterise cellular phenotypes 0 . Another complementary con¬ 
jecture is that attractors correspond to functional cellular states such as proliferation, 
apoptosis, or differentiation Q. These interpretations may cast new light on the un¬ 
derstanding of cellular homeostasis and cancer progression m . In this work, we focus 
on the computation of steady-state probabilities which are crucial for the determination 
of long-run influences and sensitivities. These are measures that quantify the impact 
of genes on other genes, considered collectively or individually, and that enable the 
identification of elements with highest impact. In this way they provide insight into the 
control mechanisms of the network. 

So far the huge-state space, which often characterises dynamical models of biolog¬ 
ical systems, tempers the application of the above mentioned techniques in the analysis 
of realistic biological systems. In fact, approximations with the use of Markov chain 
Monte Carlo (MCMC) techniques are the only viable solution to this problem 0. How¬ 
ever, due to the difficulties with the assessment of the convergence rate to the steady- 
state distribution (see, e.g., 0), certain care is required when applying these methods 
in practice. A number of statistical methods exists, which allow to empirically deter¬ 
mine when to stop the simulation and output estimates. We employ in our study: (1) the 
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two-state Markov chain approach Qol and (2) the Skart batch-means method of ma. 
The two-state Markov chain approach was introduced in 1992 by Raftery and Lewis; 
and Shmulevich et al. jSl proposed its application to the analysis of PBNs in 2003. 
However, to the best of our knowledge, since then it has not been widely applied for 
the analysis of large PBNs. In this paper, we aim to revive its usage for approximating 
steady-state probabilities of large PBNs, which often arise in computational systems 
biology as models of full-size genetic networks. We identify a problem concerned with 
the choice of the initial sample size in the two-state Markov chain approach. As we 
show, an unconscious choice may lead to biased results. We propose a few heuristics 
to avoid this problem. By extensive experiments, we show that the two-state Markov 
chain approach outperforms the Skart method in most cases, where the batch-means 
Skart method is considered the current state-of-the-art approach. In this way we show 
that the two-state Markov chain is often the optimal choice for the analysis of large 
PBN models of biological systems. 

Structure of the paper. After presenting some preliminaries in Section]^ we describe 
the two-state Markov chain approach in Sectionj^and identify a problem of generating 
biased results, due to the size of the initial sample with which the approach needs to 
start. We then propose some heuristics for the approach to avoid unfortunate initiali¬ 
sations. We perform an extensive evaluation and comparison of the two-state Markov 
chain approach and the Skart method in Section|^on a large number of randomly gener¬ 
ated PBNs. In most cases, the two-state Markov chain approach seems to have a better 
performance in terms of computational cost. Finally, we apply the two-state Markov 
chain approach to study a large PBN model of apoptosis in hepatocytes consisting of 
91 nodes in Section]^ We compute the steady-state influences and long-run sensitivi¬ 
ties and confirm previously formulated hypothesis. We conclude our paper with some 
discussions in Section|6] 


2 Preliminaries 

2.1 Finite discrete-time Markov chains (DTMCs) 

Let 5 be a finite set of states. A (first-order) discrete-time Markov chain is an 5-valued 
stochastic process with the property that the next state is independent of the past 

states given the present state. Formally, P(A)+i = \ Xt = St^Xt-\ = S;_i,...,Xo = 

So) = P(2f?+i = Sf+i I A) = Sr) for allsr+i,sr,... ,so S 5. Here, we consider time-homogenous 
Markov chains, i.e., chains where P(Xr+i = s' |A) = s), denoted is independent of 
t for any states s,s' G S. The transition matrix P — {Pss')ss'eS satisfies P^y ^0 and 
y = 1 for all s G S. We denote by tt a probability distribution on S.lf n — nP, 
then TT is a stationary distribution of the DTMC (also referred to as the invariant distri¬ 
bution). A path of length n is a sequence si —>^ S 2 s„ such that Ps,,i,_|_i > 0 and 

Si G S for i G {1,2,..., n}. State q G S is reachable from state p S 5 if there exists a path 
such that Si = p and s„ = q. A DTMC is irreducible if any two states are reachable 
from each other. The period of a state is defined as the greatest common divisor of the 
lengths of all paths that start and end in the state. A DTMC is aperiodic if all states in 
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S are of period 1. A finite state DTMC is called ergodic if it is irreducible and aperi¬ 
odic. By the famous ergodic theorem for DTMCs ifTSl an ergodic chain has a unique 
stationary distribution being its limiting distribution (also referred to as the steady-state 
distribution given by lim„^«, where no is any initial probability distribution on 
S). In consequence, the limiting distribution for an ergodic chain is independent of the 
choice of Uq. The steady-state distribution can be estimated from any initial distribution 
by iteratively multiplying it by P. 

The evolution of a first-order DTMC can be described by a stochastic recurrence 
sequence where is an independent sequence of uniformly 

distributed real random variables over [0,1] and the transition function (j): 5 X [0,1] —?► 5 
satisfies the property that V{(j){s,U) = s') =Pss' for any states s,s' G S and for any U, 
a real random variable uniformly distributed over [0,1]. When S is partially ordered and 
when the transition function 0 (•, m) is monotonic for all u, then the Markov chain is said 
to be monotone ( II13I14I 1. 

2.2 Probabilistic Boolean networks (PBNs) 

A PBN Giy,^) consists of a set of binary-valued nodes (also known as genes) V = 
{vi,V 2 , ■ ■ ■ and a list of sets = (Ti,T 2 ,... ,/v,). For each i G {1,2,...the set 
Fi = {/i"\/ 2 *\ ■ ■ •)//(/)} is a collection of predictor functions for node v;, where l{i) is 

the number of predictor functions for v;. Each G Fi is a Boolean function defined 
with respect to a subset of nodes referred to as parent nodes of v,. There is a probability 
distribution on each Fi G F: c^j'^ is the probability of selecting G Fi as the next 

predictor for v; and it holds that = 1- We denote by v, (f) the value of node v, at 

time point f G N. The state space of the PBN is 5 = {0,1}” and it is of size 2". The state 
of the PBN at time t is given by s{t) = (vi (f), V 2 (f),..., v„(f)). The dynamics of the PBN 
is given by the sequence (f ))^o- consider here independent PBNs where predictor 
functions for different nodes are selected independently of each other. The transition 
from s{t) to s(f -b 1 ) is conducted by randomly selecting a predictor function for each 
node V, from F) and by synchronously updating the node values in accordance with the 
selected functions. There are N = n"=i ^(0 different ways in which the predictors can 
be selected for all n nodes. These combinations are referred to as realisations of the 
PBN and are represented as n-dimensional function vectors G 

Fi X F 2 X ... X Fn, where k G {1,2,... ,N} and k,- G {1,2,A realization selected 
at time t is referred to as /(f). Due to independence, P(/,t) = = fk) = HLi 

In PBNs with perturbations, a perturbation parameter p G (0,1) is introduced to 
sample the perturbation vector y(f) = (71 (f), 72 (f),..., Yn{t)), where 7 (f) G {0,1} and 
P( 7 (f) = \) — p for all f and iG {1,2,... ,n}. Perturbations provide an alternative way 
to regulate the dynamics of a PBN: the next state is determined as s(f-b 1) =/(f)(s(f)) if 
7 (f) = 0 and as 5(f -b 1 ) = s(f) 0 7 (f) otherwise, where 0 is the exclusive or operator for 
vectors. The perturbations, by the latter update formula, allow the system to move from 
any state to any other state in one single transition, hence render the underlying Markov 
chain irreducible and aperiodic. Therefore, the dynamics of a PBN with perturbations 
can be viewed as an ergodic DTMC lITSl . The transition matrix is given by P^ j = (1 — 
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p)"Lf=i l[/j.(i)=i']IP(/t:) + (1 ~ (1 ~where 1 is the indicator 
function and 77 ( 5 , 5 ') is the hamming distance between states 5 , 5 ' € S. According to 
the ergodic theory, adding perturbations to any PBN assures the long-run dynamics of 
the resulting PBN is governed by a unique limiting distribution, convergence to which 
is independent of the choice of the initial state. However, the perturbation probability 
value should be chosen carefully, not to dilute the behaviour of the original PBN. In this 
way the ‘mathematical trick’, although introduces some noise to the original system, 
allows to significantly simplify the analysis of the steady-state behaviour, which is often 
of interest for biological systems. 

The density of a PBN is measured with its function number and parent nodes num¬ 
ber. For a PBN G, its density is defined as ^(G) = ^ T.fli (o{i), where n is the number 
of nodes in G, Nf is the total number of predictor functions in G, and co{i) is the number 
of parent nodes for the ith predictor function. 

Within the framework of PBNs the concept of influences is defined; it formalizes 
the impact of parents nodes on a target node and enables its quantification ( ll23l '). The 
concept is based on the notion of a partial derivative of a Boolean function / with 
respect to variable xj (I < j <n): 

where 0 is addition modulo 2 (exclusive OR) and for 1 G {0,1} 

= {xi,X2,...,Xj-i,l,Xj+l,...,Xn). 


The influence of node xj on function f is the expected value of the partial derivative 
with respect to the probability distribution D{x): 


/;(/)= Ed 


- df{x) - 

- dxj . 


= p{^ = 1} =P{/(x(^’‘’^) 


Let now Ft be the set of predictors for x, with corresponding probabilities ^ for j = 

!,...,/(/) and let 4(/j*^) be the influence of nodex^t on the predictor function fj‘\ Then, 
the influence of node xj^ on node x, is defined as: 


i(i) 


4(x,) = £4(/; 




The long-term influences are the influences computed when the distribution D{x) is the 
stead-state distribution of the PBN. 

We define and consider in this study two types of long-run sensitivities. 


Definition 1. The long-run sensitivity with respect to selection probability perturbation 
is defined as 


sc[cf =p] = \\n[cf =p]-7z\\i, 
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where || • ||/ denotes the I-norm, 7t is the steady-state distribution of the original PBN, 
p G [0,1] is the new value for c^‘\ and 7t[cJ^ = p] is the steady-state probability dis¬ 
tribution of the PBN perturbed as follows. The jth selection probability for node Xi is 
replaced with c^^ = p and all selection probabilities for k G I-j = — 

1, J + 1,..., Z(/)} are replaced with 




P)- 







The remaining selection probabilities of the original PBN are unchanged. 


Definition 2. The long-run sensitivity with respect to permanent on/off perturbations 
of a node Xi as 


■5j?N =max{||jf[;c,-EE0]-;/r||/,||7f[x;= l]-:/r||/}, 

where %, = 0], and = 1] are the steady-state probability distributions of the 

original PBN, of the original PBN with all G Fi replaced by f^^ = 0, and all G Fi 
replaced by f^^ = 1, respectively. 

Notice that the dehnition of long-run sensitivity with respect to permanent on/off 
perturbations is similar but not equivalent to the definition of long-run sensitivity with 
respect to 1-gene function perturbation of Il23l . 

3 The Two-state Markov Chain Approach 

3.1 Description 

We recall the two-state Markov chain approach originally introduced in ifTOl . The two- 
state Markov chain approach is a method for estimating the steady-state probability of 
a subset of states of a DTMC. In this approach the state space of an arbitrary DTMC is 
split into two disjoint sets, referred to as meta states. One of the meta states, numbered 
1, is the subset of interest and the other, numbered 0, is its complement. The steady- 
state probability of meta state 1, denoted q, can be estimated by performing simulations 
of the original Markov chain. For this purpose a two-state Markov chain abstraction of 
the original DTMC is considered. Let {Zt}t;>o be ^ family of binary random variables, 
where Z, is the number of the meta state the original Markov chain is in at time t. 
{Z,}t;>o is a binary (0-1) stochastic process, but in general it is not a Markov chain. 
However, as argued in Co), a reasonable assumption is that the dependency in {Z,}f^o 
falls off rapidly with lag. Therefore, a new process where z/^^ = Zi_|_(,_i)^, 

will be approximately a first-order Markov chain for k large enough. A procedure for 
determining appropriate k is given in ifTOll . The first-order Markov chain consists of the 
two meta states with transition probabilities a and j3 between them. See Figure for 
a conceptual illustration of the construction of this abstraction. 

The steady-state probability estimate q is computed from a simulated trajectory of 
the original DTMC. The key point is to determine the optimal length of the trajectory. 
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(a) Original DTMC 


(b) Two-state DTMC 


Fig. 1; Conceptual illustration of the idea of the two-state Markov chain construction, 
(a) The state space of the original discrete-time Markov chain is split into two meta 
states: states A and B form meta state 0, while states D, C, and E form meta state 1. The 
split of the state space into meta states is marked with dashed ellipses, (b) Projecting the 
behaviour of the original chain on the two meta states results in a binary (0-1) stochastic 
process. After potential subsampling, it can be approximated as a first-order, two-state 
Markov chain with the transition probabilities a and j3 set appropriately. 

Two requirements are imposed. First, the abstraction of the DTMC, i.e., the two-state 
Markov chain, should converge close to its steady-state distribution n = [lU,) n{\. For¬ 
mally, t satisfying = i\ZQ^^ = j] — 7r,j < e for a given e > 0 and all i,j € {0,1} 

needs to be determined, f is the so-called ‘burn-in’ period and determines the part of 
the trajectory of the two-state Markov chain that needs to be discarded. Second, the 
estimate q is required to satisfy P[^ — r^q^q-\-r\ ^ s, where r is the required preci¬ 
sion and s is a specified confidence level. This condition is used to determine the length 
of the second part of the trajectory used to compute q, i.e., the sample size. Now, the 
total required trajectory length of the original DTMC is then given hy M +N, where 
M — 1 -f (f — l)k and N = 1 + ([n(a, j3)] — l)k, where t = [;«(«, j3)]. The functions m 
and n depend on the transitions probabilities a and j3 and are given by 



where '!> ^ is the inverse of the standard normal cumulative distribution function. The 
expressions for m and n were originally presented in Go) . Derivations however were not 
provided and the expressions contain two oversights: in the formula for m the absolute 
value is missing in the denominator and in the formula for n the inverse of <I> should be 
used instead of <I>. We provide detailed derivations of the expressions for m and n in the 
Appendices [A|and[Bj respectively. 

Since a and j3 are unknown, they need to be estimated. This is achieved iteratively 
in the two-state Markov chain approach of Go). It starts with sampling an arbitrary 
initial length trajectory, which is then used for estimating the values of a and j3. M and 
N are calculated based on these estimates. Next, the trajectory is extended to reach the 
required length, and a and p values are re-estimated. The new estimates are used to 
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re-calculate M and N. This process is iterated until M + A^ is smaller than the current 
trajectory length. Finally, the resulting trajectory is used to estimate the steady-state 
probability of meta state 1. For more details, see m. 

The two-state Markov chain approach can be viewed as an aggregation method for 
reducing the state space. Genetically, aggregation methods aggregate groups of nodes 
in the original Markov chain in accordance with a given partition function, which leads 
to a smaller transition graph. In principle, the aggregation can be any Markov chain over 
this smaller transition system. What remains is the choice of the specific aggregation 
which is determined by the choice of the transition probabilities. The partition function 
is also used to obtain the so-called projection of the original process, i.e., the realisa¬ 
tion of the original Markov chain is projected through the partition function. Ideally, 
the aggregated chain and the projected realisation should coincide. However, since the 
projection is in general not Markovian, the aggregation which is “closest” to the pro¬ 
jection is considered instead and the “closeness” has to be appropriately defined. The 
problem of finding the optimal partition function in the case where the distance between 
the projection and the aggregation is quantified by the Kullback-Leibler divergence rate 
(KLDR) has been recently studied, see, e.g., 11161171181 . The focus in these works is on 
finding the optimal partition function, for which the KLDR distance between the pro¬ 
jection and aggregation is minimised. In our case the partition function which classifies 
the original states into the two meta states is specified by the biological question under 
study. As shown in mi and mi, for a given partition function, the aggregation clos¬ 
est to the projection can be analytically obtained provided the steady-state distribution 
of the original chain is available. The steady-state probabilities are however our goal, 
thus these techniques cannot be exploited for the determination of a and p transition 
probabilities. The iterative, statistical estimation of the transition probabilities for the 
aggregation remains the only viable solution. 

3.2 The choice of the initial sample size 

Given good estimates of a and )3, the theory of the two-state Markov chain presented 
above guarantees that the obtained value satisfies the imposed precision requirements. 
However, the two-state Markov chain approach starts with generating a trajectory of the 
original DTMC of an arbitrarily chosen initial length, i.e., Mq -f Aq = 1 -f (mo — l)k-|- 
1 + (no — 1)^. where mo it the ‘burn-in’ period and no is the sample size of the two-state 
Markov chain abstraction. An unfortunate choice may lead to first estimates of a and 
P that are biased and result in the new values of M and N such that M + N is either 
smaller or not much larger than the initial Mq +Nq. In the former case the algorithm 
stops immediately with the biased values for a, P and, more importantly, with an es¬ 
timate for the steady-state probability that does not satisfy the precision requirements. 
The second case may lead to the same problem. As an example we considered a two- 
state Markov chain with a = (0.0020214) and P = ^ (0.96). The steady-state 

probability distribution was [0.997899 0.002101]. With k — 1, e = 10^®, r = 10^^, 
s = 0.95, niQ = 5, and hq = 1,920 the first estimated values for a and p were 
(0.0005214) and 1, respectively. This subsequently led to M = 2 and N = 1,999, re¬ 
sulting in a request for the extension of the trajectory by 76. After the extension, the 
new estimates for a and p were and 1, respectively. These estimates gave M = 2, 
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N = 1, 920, and the algorithm stopped. The estimated steady-state probability distri¬ 
bution was [0.99950 0.00050], which was outside the pre-specified precision interval 
given by r. Independently repeating the estimation for 10“* times resulted in estimates 
of the steady-state probabilities that were outside the pre-specified precision interval 
90% of times. Given the rather large number of repetitions, it can be concluded that the 
specified 95% confidence interval was not reached in this case. 

The reason for the biased result is the unfortunate initial value for np and the fact that 
the real value of a is small. In the initialisation phase the value of a is underestimated 
and [n(a,j3)] calculated based on the estimated values of a and j3 is almost the same 
as no. Hence, subsequent extension of the trajectory does not provide any improvement 
to the underestimated value of a since the elongation is short and the algorithm halts 
after the next iteration. 

To identify and avoid some of such pitfalls, we consider a number of cases and for¬ 
mulate some of the conditions in which the algorithm may fail to achieve the specified 
precision. Let np be the initial size of the sample used for initial estimation of a and 
j3. We assume that neither a nor j3 is zero. Then, the smallest possible estimates for 
both a and j3 are greater than Let us set an upper bound value for np to be 10^. 
For most cases this boundary value is reasonable and we expect np to be much smaller. 
Notice however that this is the case only if the real values of a and j3 are larger than 
10^“*. We just mention here that in general the selection of a proper value for np heavily 
depends on the real values of a and j3, which are unknown a priori. From what was 
stated above, it follows that both first estimates for a and j3 are greater than 10^“*. The 
following cases are possible. 

- If both a and j3 are small, e.g., less than 0.1, then we have that 10 “* < a,j3 < 0.1 
and n(a,j3) > 72,765 as can be seen by investigating the n(-,-) function. In this 
case the sample size is increased more than 7-fold which is reasonable since the 
two-state Markov chain seems to be bad-mixing by the first estimates of the values 
for a and j3 and the algorithm asks for a significant increase of the sample size. We 
therefore conclude that the bad-mixing Markov chain case can be properly handled 
by the algorithm. 

- Both first estimates of a and j3 are close to 1. If a, j3 € [0.7,0.98], the value of 
n{a,j5) is larger than 19,000. If both a,j3 > 0.98 than the size of the sample drops, 
but in this case the Markov chain is highly well-mixing and short trajectories are 
expected to provide good estimates. 

- The situation is somewhat different if one of the parameters is estimated to be 
small and the other is close to 1 as in the example described above. The extension 
to the trajectory is too small to significantly change the estimated value of the small 
parameter and the algorithm halts. 

Considering the above cases leads us to the observation that the following situation 
needs to be treated with care: The estimated value for one of the parameters is close to 
the value of the second parameter is close to 1, and n{a,j5) is either smaller or not 
significantly larger than np. 

First approach: pitfall avoidance. In order to avoid this situation, we determine np 
which in principle could lead to initial inaccurate estimates of a or j3 and such that the 
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r 

0.01 

0.001 

0.0001 

s 

0.9 

0.95 

0.975 

0.9 

0.95 

0.975 

0.9 

0.95 

0.975 

no e 

0 

[2,136] 

0 

[2,1161] 

[2,1383] 

[2,1582] 

[2,11628] 

[2,13857] 

]2,15847] 


Table 1; Ranges of integer values for no that do not satisfy the ‘critical’ condition 
n{a,j5) < 2 no for the given values of r and i. 


next sample size given by \n{a,j5)~\ would practically not allow for an improvement of 
the estimates. We reason as follows. As stated above, the ‘critical’ situation may take 
place when one of the parameters is estimated to be very small, i.e., close to and 
the increase in the sample size is not signihcant enough to improve the estimate. If the 
initial estimate is very small, the real value is most probably also small, but the estimate 
is not accurate. If the value is underestimated to the lowest possible value, i.e., on 
average the improvement can take place only if the sample size is increased at least by 
no. Therefore, with the trade-off between the accuracy and efficiency of the method in 
mind, we propose the sample size to be increased at least by no. Therefore, the ‘critical’ 
situation condition is n(a,j3) < 2no. By analysing the function n(-,-) as described in 
details in Appendix [P] we can determine the values of no that are ‘safe’, i.e., which do 
not satisfy the ‘critical’ condition. We present them in Table [T] for a number of values 
for r and s. 

Second approach: controlled initial estimation of a and j5. The formula for n is 
asymptotically valid for a two-state Markov chain provided that the values for a and j3 
are known. However, these values are not known a priori and they need to be estimated. 
Unfortunately, the original approach does not provide any control over the quality of the 
initial estimate of the values of these parameters. In certain situation, e.g., as in the case 
discussed above, the lack of such a control mechanism may lead to results with worse 
statistical conhdence level than the specified one given by s. In the discussed example 
s = 95%, but this value was not reached in the performed experiment. In order to address 
this problem, we propose to extend the initial phase of the two-state approach algorithm 
in the following way. The algorithm samples a trajectory of the given Markov chain and 
estimates the values of a and p. It might be the case that an arbitrarily chosen initial 
sample size is not big enough to provide non-zero estimates for the two parameters. 
If this is the case, the initial sample size is doubled and the trajectory is elongated to 
collect a sample of required size. This is repeated iteratively until non-zero estimates 
for both a and j3 are obtained. We introduce the following notation; a and j3 are the 
non-zero estimates of the values of a and j3, respectively. Furthermore, let no be the 
sample size used to obtain hrst non-zero estimates a and p, i.e., no is either the initial 
sample size or it is two to power some multiple of the initial sample size. 

Once the non-zero estimates are available, the algorithm computes the sample size 
required to reach the s conhdence level that the true value of min(a,)3) is within a cer¬ 
tain interval. For dehniteness, let us assume from now on that a < p, which suggests 
that min(a,)3) = a. During the execution of the procedure outlined in the following 
the inequality may be inverted. If this is the case, the algorithm makes corresponding 
change in the consideration of a and p. 
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The aim is to have a good estimate for a. Notice that the smallest possible initial 
value of a is greater than We refer to ^ as the resolution of estimation. Given the 
resolution, one cannot distinguish between values of a in the interval (a — ^, a -I- ^). 
In consequence, if a G (a—+ then the estimated value a should be con¬ 
sidered as optimal. Hence, one could use this interval as the one which should contain 
the real value with specified confidence level. Nevertheless, although the choice of this 
interval usually leads to very good results, as experimentally verified, the results are 
obtained at the cost of large samples which make the algorithm stop immediately after 
the initialisation phase. Consequently, the computational burden is larger than would 
be required by the original algorithm to reach the desired precision specified by r and 
s parameters in most cases. In order to reduce this unnecessary overhead, we consider 
the interval (d — ^, d + f), which is wider than the previous one whenever d > ^ and 
leads to smaller sample sizes. 

The two-state Markov chain consists of two states 0 and 1, i.e., the two meta states 
of the original DTMC. We set a as the probability of making the transition from state 0 
to state 1 (denoted as 0 —> 1). The estimate d is computed as the ratio of the number of 
transitions from state 0 to state 1 to the number of transition from state 0. Let np.a be 
the number of transitions in the sample starting from state 0. Let Xi,i= 1,2,..., no,a, be 
a random variable defined as follows: is 1 if /th transition from meta-state 0 is 0 —> 1 
and 0 otherwise. 


Notice that state 0 is an accessible atom in the terminology of the theory of Markov 
chains, i.e., the Markov chain regenerates after entering state 0, and hence the ran¬ 
dom variables Xi, i = 1,2,...,no.a, are independent. They are Bernoulli distributed 
with parameter a. The unbiased estimate of the population variance from the sam¬ 
ple, denoted d^, is given by = d • (1 — d) • ^ . Due to independence, d^ is also 

the asymptotic variance and, in consequence, the sample size that provides the spec¬ 


ified confidence level for the estimate of the value of a is given by na,i(^)«0,a) = 


The Markov chain is in state 0 with steady-state probability . Then, given that 
the chain reached the steady-state distribution, the expected number of regenerations, 
i.e., returns to meta state 0, in a sample of size n is given by Therefore, the sample 
size used to estimate the value of a with the specified confidence level s is given by 
• na,i(d,no,a)- As the real values of a and j3 are unknown, the estimated 

values d and j3 can be used in the above formula. If the computed «« is bigger than 
the current number of transitions n^ a, we extend the trajectory to reach transitions 
from 0 to 1 and re-estimate the values for a and j3 using the extended trajectory. We 
repeat this process until the computed value is smaller than the number of transitions 
used to estimate a. In this way, good initial estimates for a and j3 are obtained and the 
original two-state Markov chain approach using the formula for n(a,l5) is run. 


Third approach: simple heuristics. When performing the initial estimation of a and 
j3, we require both the count of transitions from meta state 0 to meta state 1 and the 
count of transitions from meta-state 1 to meta state 0 be at least 3. If this condition is not 
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satisfied, we proceed by doubling the length of the trajectory. In this way the problem of 
reaching the resolution boundary is avoided. Our experiments showed that this simple 
approach in many cases led to good initial estimates of the a and j3 probabilities. 


4 Evaluation 


We implemented the two-state Markov chain approach with the simple heuristics pre¬ 
sented in Section!^ and the Skart method of lITTl in the tool ASSA-PBN, which was 
specially designed for steady-state analysis of large PBNs mi (see Section |2.2| for 
the theoretical background of PBNs). We verihed with experiments that with use of 
the simple heuristics, the two-state Markov chain approach could meet the predefined 
precision requirement even in the case of an unlucky initial sample size. For the steady- 
state analysis of large PBNs, applications of these two methods necessitate generation 
of trajectories of significant length. To achieve this in an efficient way, we applied the 
alias method ll20l to sample the consecutive trajectory state. This enables ASSA-PBN, 
for example, to simulate 4,800 steps within Is for a 2,000 nodes PBN (state-space of 
size 

We choose the Skart method HD as a reference for the evaluation of the per¬ 
formance of the two-state Markov chain approach. The Skart method is a successor 
of ASAP3, WASSP, and SBatch methods, which are all based on the idea of batch 
means HD- It is a procedure for on-the-fly statistical analysis of the simulation out¬ 
put, asymptotically generated in accordance with a steady-state distribution. Usually 
it requires an initial sample of size smaller than other established simulation analysis 
procedures HD- In a brief, high-level summary, the algorithm partitions a long simula¬ 
tion trajectory into batches, for each batch computes a mean and constructs an interval 
estimate using the batch means. Further, the interval estimate is used by Skart to decide 
whether a steady state distribution is reached or more samples are required. For a more 
detailed description of this method, see HD- 

The Skart method differs in two key points with the two-state Markov chain ap¬ 
proach. First, it specifies the initial trajectory length to be at least 1,280, while for the 
two-state Markov chain approach this information is not provided. Second, the Skart 
method applies the student distribution for skewness adjustment while the two-state 
Markov chain approach makes use of the normal distribution for confidence interval 
calculations. 

To compare the performance of the two methods, we randomly generated 882 differ¬ 
ent PBNs using ASSA-PBN. ASSA-PBN can randomly generate a PBN which satisfies 
structure requirements given in the form of hve input parameters; the node number, the 
minimum and the maximum number of predictor functions per node, finally the min¬ 
imum and maximum number of parent nodes for a predictor function. We generated 
PBNs with node numbers from the set {15,30,80,100,150,200,300,400,500,1000, 
2000}. We assigned the obtained PBNs into three different classes with respect to the 
density measure S’: dense models with density 150-300, sparse models with density 
around 10, and in-between models with density 50-100. The two-state Markov chain 
approach and the Skart method were tested on these PBNs with precision r set to the 
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k 

0 

5 

10 

15 

20 

25 

30 

trS < tSkart 

69.83% 

55.10% 

41.02% 

31.03% 

25.86% 

22.76% 

20.05% 

tSkart < trs 

30.38% 

18.68% 

11.10% 

7.39% 

5.49% 

4.43% 

3.74% 


Table 2: Performance comparison of the Skart and the two-state MC methods. Expla¬ 
nations in the text. 


node 

number 

method 

the two-state Markov chain 

Skart 

precision 

0.01 

0.005 

0.001 

0.01 

0.005 

0.001 

1,000 

trajectory size 

35,066 

133,803 

3,402,637 

37,999 

139,672 

3,272,940 

time cost (s) 

6.19 

23.53 

616.26 

7.02 

24.39 

590.26 

2,000 

trajectory size 

64,057 

240,662 

5,978,309 

63,674 

273,942 

5,936,060 

time cost (s) 

20.42 

67.60 

1722.86 

20.65 

78.53 

1761.05 


Table 3: Approximate steady-state analysis of two large PBNs. 


values in {10-^5 X 10-^10-^5x IQ-'^, 10-'^,8 x 10-^5 x lO^^}. We set e to 10-'“ 
for the two-state Markov chain approach and s to 0.95 for both methods. 

The experiments were performed on a HPC cluster, with CPU speed ranging be¬ 
tween 2.2GHz and 3.07GHz. ASSA-PBN is implemented in Java and the initial and 
maximum Java virtual machine heap size were set to 503MB and 7.86GB, respectively. 
We collected 5263 results with the information on the PBN node number, its density 
class, the precision value, the estimated steady-state probabilities computed by the two 
methods, and their CPU time costs. The steady-state probabilities computed by the two 
methods are comparable in all the cases (data not shown in the paper). For each exper¬ 
imental result /, we compare the time costs of the two methods. Let frs(i) and tskart{i) 
be the time cost for the two-state Markov chain approach and the Skart method, respec¬ 
tively. We say that the two-state Markov chain approach is by k per cent faster than the 
Skart method if ^ ^ -pj^g jehnition for the Skart method to be faster 

than the two-state Markov chain approach is symmetric. In Tablewe show the per¬ 
centage of cases in which the two-state Markov chain approach was by k per cent faster 
than Skart and vice versa for different k. In general, in about 70% of the results, the 
two-state Markov chain was faster than Skart. It is also clear that the number of cases 
the two-state Markov chain approach was faster than Skart is larger than in the opposite 
case. 

We show in Table the trajectory sizes and the time costs for computing steady- 
state probabilities of two large PBNs using the two-state Markov chain approach and 
the Skart method for different precision requirements. The two analysed PBNs consist 
of 1,000 and 2,000 nodes, which give rise to state spaces of sizes exceeding 10 ^““ and 
10600 , respectively. 

5 A Biological Case study 

A multicellular organism consists of cells that form a highly organised community. The 
number of cells in this system is tightly controlled by mechanisms that regulate the cell 































14 


Mizera, Pang and Yuan 


division and the cell death. One of these mechanisms is the programmed cell death, also 
referred to as apoptosis: if cells are damaged, infected, or no longer needed, the intra¬ 
cellular death program is activated, which leads to fragmentation of the DNA, shrinkage 
of the cytoplasm, membrane changes and cell death without lysis or damage to neigh¬ 
bouring cells. This process is regulated by a number of signaling pathways which are 
extensively linked by cross-talk interactions. In BTl . a large-scale Boolean network 
of apoptosis in hepatocytes was introduced, where the assigned Boolean interactions 
for each molecule were derived from literature study. In fT2\ . the original multi-value 
Boolean model was cast into the PBN framework: a binary PBN model, so-called ‘ex¬ 
tended apoptosis model’ which comprised 91 nodes (state-space of size 2®*) and 102 
interactions was constructed, see Figure in Appendix for the wiring of the PBN 
model. With respect to the original multi-value Boolean model of ETIl . the PBN model 
was extended as described in ll22l . For example, the possibility of activation of NF-fcB 
through Caspase 8 (C8*) was included. The model was fitted to steady-state experimen¬ 
tal data obtained in response to six different stimulations of the input nodes, see ll22l 
for details. 

As can be seen from the wiring of the network, the activation of complex2 (co2) by 
RIP-deubi can take place in two ways: 1) by a positive feedback loop from activated 
C8* and P -4 tBid — !• Bax smac —RIP-deubi —co2 C8*-co2 C8*, and 2) by 
the positive signal from UV-B irradiation (input nodes UV(1) or UV(2)) Bax —> 
smac — > RIP-deubi —co2. The former to be active requires the stimulation of the type 
2 receptor (T2R). The latter way requires complexl (col) to be active, which cannot 
happen without the stimulation of the TNF receptor-1. Therefore, RIP-deubi can acti¬ 
vate co2 only in the condition of co-stimulation by TNF and either UV(1) or UV(2). In 
consequence, it was suggested in ll22ll that the interaction of activation of co2 via RIP- 
deubi is not relevant and could be removed from the model in the context of modelling 
primary hepatocyte. However, due to the problem with efficient generation of very long 
trajectories in optPBN toolbox, quantitative analysis was hindered and this hypothesis 
could not be verified (1221). 

In this work, we take up this challenge and we quantitatively investigate the rele¬ 
vancy of the interaction of activation of co2 via RIP-deubi. We perform an extensive 
analysis in the context of co-stimulation by TNF and either UV(1) or UV(2): we com¬ 
pute long-term influences of parent nodes on the co2 node and the long-run sensitivities 
with respect to various perturbations related to specific predictor functions and their 
selection probabilities. For this purpose we apply the two-state Markov chain approach 
as implemented in our ASSA-PBN tool ifT^ to compute the relevant steady-state prob¬ 
abilities for the best-fit models described in ll22ll . Due to the efficient implementation, 
the ASSA-PBN tool can easily deal with trajectories of length exceeding 2 x 10‘^ for 
this case study. 

We consider 20 distinct parameter sets of Ea that resulted in the best fit of the 
‘extended apoptosis model’ to the steady-state experimental data in six different stimu¬ 
lation conditions. In 112^ . parameter estimation was performed with steady-state mea¬ 
surements for the nodes apoptosis, C3apl7 or C3apl7_2 depending on the stimulation 
condition considered, and N F-yB. The optimisation procedure used was Particle Swarm 
and fit score function considered was the sum of squared errors of prediction (SSE) and 
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TNF and UV(1) 

TNF and UV(2) 


^RIP-deubi 

^col 

fpADD 

^RIP-deubi 

^col 

fpADD 

Best fit 

0.2614 

0.9981 

0.9935 

0.2615 

0.9980 

0.9936 

Min 

0.0000 

0.9979 

0.9935 

0.0000 

0.9979 

0.9936 

Max 

0.3145 

0.9988 

0.9944 

0.3146 

0.9990 

0.9947 

Mean 

0.2087 

0.9982 

0.9937 

0.2088 

0.9982 

0.9938 

Std 

0.0735 

0.0002 

0.0002 

0.0735 

0.0002 

0.0003 


Table 4: Long-term influences of RIP-duebi, col, and FADD on co2 in the ‘extended 
apoptosis model’ in ll22l under the co-stimulation of both TNF and UV(1) or UV(2). 


the sum was taken over the three nodes in the six stimulation conditions. We took all 
the optimisation results from the three independent parameter estimation runs of ll22ll . 
each containing 7500 parameter sets. We sorted them increasingly with respect to the 
cost function value obtained during optimisation, removed duplicates, and finally took 
the first 20 best-fit parameter sets. 

As mentioned above, we fix the experimental context to co-stimulation of TNF and 
either UV(1) or UV(2). We note that originally in ll^ UV-B itTadiation conditions 
were imposed via a multi-value input node UV which could take on three values, i.e., 
0 (no itTadiation), 1 (300J/m^ UV-B irradiation), and 2 (6007/m^ UV-B itTadiation). 
In the model of ||22l, UV input node was refined as UV(1) and UV(2) in order to cast 
the original model into the binary PBN framework. Therefore, we consider in our study 
two cases: 1) co-stimulation of TNF and UV(1) and 2) co-stimulation of TNF and 
UV(2). Node co2 has two independent predictor functions: co2 = col A FADD or co2 = 
col A FADD A RIP-deubi. The selection probabilities are denoted as and 
respectively. Their values have been optimised in Il22l . 


We start with computing the influences with respect to the steady-state distribu¬ 
tion, i.e., the long-term influences on co2 of each of its parent nodes: RIP-deubi, col, 
and FADD, in accordance with the definition in Section 2.2 Notice that the computa¬ 
tion of the three influences requires several joint steady-state probabilities to be esti¬ 
mated with the two-state Markov chain approach, e.g., (col=l,FADD=l,RIP-deubi=0) 
or (col=l,FADD=0). Each probability determines a specific split of the original Markov 
chain. For example, in the case of the estimation of the joint steady-state probabil¬ 
ity for (col=l,FADD=0), the states of the underlying Markov chain of the apoptosis 
PBN model in which col=l and FADD=0 constitute meta state 1 and all the remain¬ 
ing states form meta state 0. Therefore, the estimation of influences is computationally 
demanding. The summarised results for the 20 parameter sets are presented for the co¬ 
stimulation of TN F and UV(1) or TN F and UV(2) in Table|^ They are consistent across 
the different parameter sets and clearly indicate that the influence of RIP-deubi on co2 
is small compared to the influence of col or FADD on co2. However, the influence of 
RIP-deubi is not negligible. 


We take the analysis of the importance of the interaction between Rl P-deubi and co2 
further and we compute various long-run sensitivities with respect to selection probabil¬ 
ity perturbation. In particular, we perturb the selection probability i5%, i.e.. 
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TNF and UV(1) 

TNF and UV(2) 

^2 

±5% 

-5% 

= 0 

±5% 

-5% 

= 0 

Best fit 

0.0003 

0.0002 

0.0011 

0.0002 

0.0004 

0.0011 

Min 

0.0002 

0.0002 

0.0003 

0.0002 

0.0002 

0.0002 

Max 

0.0008 

0.0008 

0.0014 

0.0012 

0.0007 

0.0013 

Mean 

0.0005 

0.0005 

0.0009 

0.0004 

0.0004 

0.0009 

Std 

0.0001 

0.0001 

0.0003 

0.0002 

0.0001 

0.0003 


Table 5: Long-ran sensitivities w.r.t selection probability perturbations for the ‘extended 
apoptosis model’ of ll22l under the co-stimulation of TNF and UV(1) or TNF and 
UV(2). 


we set the new value by multiplying the original value by (1 ±0.05), and compute in 
line with Definition[^how the joint steady-state distribution for (apoptosis,C3apl7,NF- 
yB) differs from the non-perturbed one with respect to the /i norm, i.e., 11 • 11 1 . We notice 
that the computation of the full steady-state distribution for the considered PBN model 
of apoptosis is practically intractable, i.e., it would require the estimation of 2®* val¬ 
ues. Therefore, we restrict the computations to the estimation of eight joint steady-state 
probabilities for all possible combinations of values for (apoptosis,C3apl7,N FtcB), i.e., 
the experimentally measured nodes. Each estimation is obtained by a separate run of 
the two-state Markov chain approach with the split into meta states determined by the 
considered probability as explained above in the case of the computation of long-term 
influences. To compare the estimated distributions we choose the l\ norm after ll24ll . 
where it is used in the computations of similar types of sensitivities for PBNs to these 
defined in Section 2.2 Notice that the /] norm of the difference of two probability distri¬ 
butions on a finite sample space is twice the total variation distance. The latter is a well- 
established metric for measuring the distance between probability distributions defined 
as the maximum difference between the probabilities assigned to a single event by the 


two distributions (see, e.g., 123 ). Additionally, we check the difference when 
set to 0 (and, in consequence, is set to 1). The obtained results for the 20 parameter 
sets in the conditions of co-stimulation of TNF and UV(1) and co-stimulation of TNF 
and U V(2) are summarised in Table|^ In all these cases, the sensitivities are very small. 
Therefore, the system turns to be insensitive to small perturbations of the value of . 
Also the complete removal of the second predictor function for co2 does not cause any 
drastic changes in the joint steady-state distribution for (apoptosis,C3apl7,NF-ffB). 

Finally, we compute the long-run sensitivity with respect to permanent on/off pertur¬ 
bations of the node RIP-deubi in accordance with Definition|^ As before, we consider 
the joint steady-state distributions for (apoptosis,C3apl7,NF-K'B) and we choose the 
(i-norm. The results, given in Table show that in both variants of UV-B irradiation 
the sensitivities are not negligible and the permanent on/off perturbations of RIP-deubi 
have impact on the steady-state distribution. 

To conclude, all the obtained results indicate that in the context of co-stimulation 
of TNF and either UV(1) or UV(2) the interaction between RIP-deubi and co2 plays 
a certain role. Although the elimination of the interaction does not invoke significant 





















Reviving the Two-state Markov Chain Approach 


17 


RIP-deubi f. pert. 

Best fit 

Min 

Max 

Mean 

Std 

TNF&UV(1) 

0.3075 

0.0130 

0.3595 

0.2089 

0.0823 

TNF&UV(2) 

0.3097 

0.0105 

0.3612 

0.2105 

0.0827 


Table 6; Long-run sensitivities w.r.t permanent on/off perturbations of RIP-deubi for 
the ‘extended apoptosis model’ of i22\ . 


changes to the considered joint steady-state distribution, the long-term influence of RIP- 
deubi on co2 is not negligible and may be important for other nodes in the network other 
than apoptosis, nodeC3apl7, or NF-kB. 

6 Discussion and Conclusion 

In this paper, we focused on two statistical methods for estimating steady-state prob¬ 
abilities of large PBNs; the two-state Markov chain approach and the Skart method. 
The Skart method follows a continuous development mi, while the two-state Markov 
chain approach was originally introduced by Raftery and Lewis in 1992, and only re¬ 
cently it was explored for the analysis of a relatively large PBN model in Eli . To revive 
the application of the two-state Markov chain approach, we propose a few heuristics to 
avoid a problem with the size of the initial sample which can lead to biased results. By 
extensive experiments, we show that the two-state Markov chain approach outperforms 
the Skart method in most cases. In the end, we illustrated the usability of the two-state 
Markov chain approach on a realistic biological system. 

Our work in the current paper is closely related to statistical model checking II26I27L 
a simulation-based approach using hypothesis testing to infer whether a stochastic sys¬ 
tem satisfies a property. Most current tools for statistical model checking are restricted 
for bounded properties which can be checked on finite executions of the system. In 
recent year, both the Skart method and the perfect simulation algorithm have been 
explored for statistical model checking of steady state and unbounded until proper¬ 
ties 1281291 . which was considered as a future step of statistical model checking 
The perfect simulation algorithm for sampling the steady-state of an ergodic DTMC is 
based on the indigenous idea of the backward coupling scheme originally proposed by 
Propp and Wilson in ifTSll . It allows to draw independent samples which are distributed 
exactly in accordance with the steady-state distribution of a DTMC. However, due to the 
nature of this method, each state in the state space needs to be considered at each step of 
the coupling scheme. Of course, a special, more efficient variant of this method exists. If 
a DTMC is monotone, then it is possible to sample from the steady-state distribution by 
considering the maximal and minimal states only IfTM . For example, this approach 
was exploited in ll28l for model checking large queuing networks. Unfortunately, it is 
not applicable in the case of PBNs with perturbations. In consequence, the perfect sim¬ 
ulation algorithm is only suited for at most medium-size PBNs and large-size PBNs 
are out of its scope. Thus, in this paper we have only compared the performance of the 
two-state Markov chain approach with the Skart method. 

Moreover, in this study we have identified a problem of generating biased results 
by the original two-state Markov chain approach and have proposed three heuristics 
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to avoid wrong initialisation. Finally, we demonstrated the potential of the two-state 
Markov chain approach on a study of a large, 91-node PEN model of apoptosis in hep- 
atocytes. The two-state Markov chain approach facilitated the quantitative analysis of 
the large network and the investigation of a previously formulated hypothesis regard¬ 
ing the relevance of the interaction of activation of co2 via RIP-deubi. In the future, 
we aim to investigate the usage of the discussed statistical methods for approximate 
steady-state analysis in a wide project on systems biology. For instance, we will fur¬ 
ther apply them to develop new techniques for minimal structural interventions to alter 
steady-state probabilities, which will enable the synthesis of optimal control strategies 
for large regulatory networks. 

Acknowledgment. Experiments presented in this paper were carried out using the FIPC 
facilities of the University of Luxembourg ll3Tll (http: //hpc. uni . lu). 
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A Derivation of the number of “burn-in” iterations 


p = 


Let {Zt}t>o be a discrete-time two-state Markov chain as given in Figure lb Z? has 
the value 0 or 1 if the system is in state 0 or state 1 at time n, respectively. The tran¬ 
sition probabilities satisfy 0 < a, j3 < 1 and the transition matrix for this chain has the 
following form 

1 - a a 

p l-p 

Matrix P has two distinct eigenvalues: 1 and A = (l — a — j3). Notice that |A| < 1. 

The chain is ergodic and the unique steady-state distribution is n — {kq n\\ = 
denote the expected value of Z, for any fixed t > 0, with re¬ 
spect to the steady-state distribution TT. We have that 

The OT-step transition matrix can be written, as can be checked by induction, in the 
form 


Tto 7Zi 

A'" 

a -a 

_ TTo TTi 

a + p 

l-P P \ 


pm _ 


where X is the second eigenvalue of P. 

Suppose we require m to be such that the following condition is satisfied 


[F[Z^ = 0\Zo = j] P[Z,„ = l|Zo = ;•]]-[tio ] < [e e] 


( 1 ) 


( 2 ) 


for some £ > 0. For any vector v = [vi V 2 • • ■ v„]^ G K" we use |v| to denote [|vi | |v 2 | 
... |v„|]^, where T is the transposition operator. If cq = [1 0] and ei = [0 1], then for 
j G {0,1} we have that 


[P[Z„, = 0|Zo = ;•] P[Z™ = 1 |Zo = J]] = ejP"’. 
With Q and ([^, condition (|^ can be rewritten as 


(3) 


( 

Tto 

A'" 

a -a 

\ r 1 


Tto 

' a + p 

-P p 

1 - [ Tto Ttl J 


<[££]. 


For 7 = 0 and 7 = 1 the above simplifies to 

A" 


a + P 


a - a 


< [e e] 


and 


a+p 


fi] 


< e e 


respectively. Therefore, it is enough to consider the following two inequalities 


X"’a 


< £ 


a + p 

which, since a,p >0, can be rewritten as 

£(«-!-j3) 


and 


A^jS 


|A'”| < 


a 


and 


a+p 


\n< 


<£, 


£(a-f j3) 
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Equivalently, m has to satisfy 


|1'"| < 


e{a + fi) 


tnax(a,j3) 

By the fact that \X"‘\ = |A|"’ this can be expressed as 

n im . eja + p) 

' ' tnax(a,/3)' 

Then, by taking the logarithm to base 10 on both side^ we have that 

,„.iog(|A|)<i„g(A^ 

and in consequence, since |A| < 1 and log|A| < 0, 

1 / eja+li) \ 

{maxia.p)) 


m > 


log(|A|) 


B Derivation of the sample size 


By the Law of Large Numbers for irreducible positive recurrent Markov chains Z„ — 
7Z\ a. s. with n —> oo, where Z„ = Now, by a variant of the Central Limit 

Theorem for non-independent random variable^ for n large, Z„ is approximately nor¬ 
mally distributed with mean n\ = and asymptotic variance 

Section]^ for the derivation of the asymptotic variance. Let X be the standardised Z„, 
i.e.. 




Z„ TTi 


<Tas/ v/w 


If follows that X is normally distributed with mean 0 and variance 1, i.e., X ~ N{Q, 1). 

Now, we require n to be such that the condition P[;ri — r < < 7Zi + r] = s is 

satisfied for some specified r and s. This condition can be rewritten as 


P[—r < Z„ — ;ri < r] = s, 

and further as _ 

Vn ^ Z„ - TTi ^ ^ Vn 

<Tas <Tas/ v/w C7as 

^ In fact, by the formula for change of base for logarithms, the natural logarithm (In), the loga¬ 
rithm to base 2 (log 2 ), or a logarithm to any other base could be used to calculate m instead of 
log. Notice that m does not depend on the choice of the base of the logarithm! 

Notice that the random variables Zf, Z,^i which values are consecutive states of a trajectory 
are correlated and are not independent. 
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which is 


r 'Jn -v/n, 

[-r - ^ < Y < r - = i. 


Since X ~ Y(0,1) and Y(0,1) is symmetric around 0, it follows that 


•v/n, s 

oJ 2 


and 




Let 0(-) be the standard normal cumulative distribution function. Then the above can 
be rewritten as 

<P(r-^) = ^(l+s). 

Therefore, if we denote the inverse of the standard normal cumulative distribution func- 
tion with ^>^* (•), we have that 

,.^=^-1 (1(1+,)). 


In consequence. 




ap (2-a-j3) 

(a+i3)^ 




2 ■ 


C Derivation of the asymptotic variance 

By the Central Limit Theorem for stationary stochastic processed \/n{Zn — ?ti) 
N{0, (J^j as n —?> oo, where is the so-called asymptotic variance given by 

= Var;i(Z;) -f 2 CoYjc{Zj,Zj+k) (4) 

k=l 

and Var;i(-) and CoV;i(-) denote the variance and covariance with respect to the steady- 
state distribution n, respectively. We proceed to calculate First, observe that E;t(Z„ • 
Z„+i) = — P)- Zn -Zn+i ^ 0 if and only if the chain is state 1 at time n and 

remains in 1 at time n -P 1, i-C-, Z„ = Z„+i = 1. The probability of this event at steady 
state is ^^^(1 — P)- Then, by the definition of covariance, we have that the steady-state 

^ After discarding the ‘bum-in’ part of the trajectory, we can assume that the Markov chain in 
a stationary stochastic process. 
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covariance between consecutive random variables of the two-state Markov chain, i.e., 
CoV;c(Z„,Z„+i) is 


Cov^(Z„,Z„+i) = [(Z„ -E;,(Z„))(Z„+i -E;,(Z„+i))] 


= E,, 


(Z„- 


“ )(Z«+i- “ 


a + p 


a + p 


— E;e 




= E;,(Z„Z„+l) 


j^(E.(Z„) +E,(Z„,)) + 


a(l-j3) ^ 

a + p ~ (a + p)^^ {a + pf 
ap{l-a-p) 

(a + /3)2 ■ 


Further, we have that Var;t(Z„) = tto • Tti = (a+%^ (variance of the Bernoulli distribu¬ 
tion) and it can be shown that CoV;i;(Z„,Z„+j.) = (1 — a — P)^ • Var;r(Z„) for any k > 1. 
Now, according to Equation Q, we have 


= Var;c (Xj)+2j^CoVjc(Xj,Xj+k) 
k=l 


aP 

w+w 


+ 2l^{l-a-Pf 

k=l 


aP 

(a-fj3)2 


aP 

(oT^ 


2aP 

{a + py 


■ Y^ii-a-py 


k=l 


aP 2aP l-a~P 

{a + py~^ {a + py a + p 

aP(2-a-P) 

{a + py ■ 


In consequence, Z„ is approximately normally distributed with mean and variance 
1 aj3(2-a-|3) 
n (a+^P ■ 


D Derivations for the pitfall avoidance heuristics 

We start with analysing the minimum values n(-, •) can attain. The function is consid¬ 
ered on the domain D — (0,1] x (0,1] and, as mentioned before, the estimated values of 
a and p are within the range !]• Computing the partial derivatives, equating them to 
zero, and solving for a and j3 yields a = —P, which has no solution in the considered 
domain. Hence, the function has neither local minimum nor maximum on D. Let us fix 
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j3 for a moment and consider n{a,l5) as a function of a. We denote it as np{a). By 
differentiating with respect to a, we obtain 



1 P (a^--4a+ 2)3) 
^ (a + )3)^ 


where Cr.s 




By equating to zero and solving for a we get two solutions: ai = 2 — — Ip +4 

and a 2 = 2 + — 2)3 +4. Since the second solution is always greater than 1 on the 

(0,1] interval, only the first solution is valid. The sign of the second derivative of rip (a) 
with respect to a at a\ is negative. This shows that for any fixed )3, np (a) grows on the 
interval [—,ai], attains its maximum at ai and decreases on the interval [ai, 1]. Notice 
that n is symmetric, i.e., n{a,l5) =n{l5,a). Thus the minimum value n could attain for 
a and )3 estimated from a sample of size no is given by min 
After evaluating n we get 


and 

Vno no/ 4cr,s \no ) Cr,d-• (1+no)'’ 


Now, to avoid the situation where the initial estimates of a and )3 lead to n{a,P) < 
2 no, it is enough to make sure that given r and s the following condition is satisfied: 
min(n(^, ^),n(^, 1)) ^ 2no. This can be rewritten as 

( (8cr,s- l)no + 1 < 0 

[ 2Cr,sno + 6Cr,sno + (6Cr,i- l)no + 2cr,i+ 1 <0 


Both inequalities can be solved analytically. Given that no > 0, the solution of the first 
inequality is 


«0G 

no S 0 


Cr,s < 8 
Cr,s ^ g- 


(5) 


The solution of the second inequality is more complicated, but can be easily obtained 
with computer algebra system software (e.g.. Maple™). In Table we present some 
solutions for a number of values for r and s. 
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E The Boolean model of apoptosis 



Fig. 2; The wiring of the probabilistic Boolean model of apoptosis originally introduced 

in Ea. 






















































































































